Baltimore: A City of Neighborhoods

By: Martin Jauquet
Final Project - CMSC320 - Spring 2021 - Jose Calderon

Introduction

Baltimore City, the greatest city in America (according to the benches), has a rich history and a unique culture. Throughout its history, Baltimore has attracted many migrants, from Europeans at the turn of the 20th century to more recent Middle Eastern and Latino arrivals. These groups kept their traditions alive in their own neighborhoods in the City: Little Italy for the Italians, Greektown for the Greeks, and Highlandtown for Latinos, to name a few. The sense of neighborhood pride and loyalty is so strong that many Baltimoreans name their neighborhood when asked where they are from.

In recent years, Baltimore has received considerable national attention, though not always in a good way. Crime, vacant houses, and failing schools are often what people think about when they think of Baltimore. Some neighborhoods have it worse than others, but why? Is there one factor that leads to higher crime and lower graduation rates? Or is it a combination of factors that makes it hard for some of these neighborhoods to have a decent standard of living?

In this tutorial, we will look at some of the data relating to income, high school completion, vacant buildings, and crime for the neighborhoods in the city. We will do some analysis to determine if there is a correlation between any of these. Lastly, we will create an interactive map so it is easier to visualize the data.

Data Collection

First, we will get some data from the Vital Signs Open Data Portal, an open data portal for Baltimore. It has a large amount of data broken up into various geographic regions. For this tutorial, we will explore data reported by Community Statistical Area (CSA), which is a group of neighborhoods with similar characteristics. We will use the following data sets: High School Completion Rate, Part 1 Crime Rate per 1,000 Residents, Percentage of Residential Properties that are Vacant and Abandoned, and Median Household Income.

On each dataset's page, find the download dropdown and download the spreadsheet to your local folder. Each page also has a description of the dataset. These data sets are some indicators of the well-being of a neighborhood.

We will then open each CSV and read it into a pandas dataframe, remove columns that we don't need, and rename other columns for readability.
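The loading step described above might look roughly like the sketch below. The column names (`OBJECTID`, `CSA2010`, `mhhi10`, and so on) and the values are made-up placeholders, not the portal's actual schema, and an in-memory string stands in for the downloaded file so the example runs on its own.

```python
import io
import pandas as pd

# Stand-in for a downloaded CSV. Column names and values are placeholders;
# the real files from the portal will differ.
raw_csv = io.StringIO(
    "OBJECTID,CSA2010,mhhi10,mhhi11,Shape__Area\n"
    "1,Canton,77000,80000,1.5\n"
    "2,Greektown,40000,42000,0.9\n"
)

# With a real download this would be something like pd.read_csv("income.csv").
income = pd.read_csv(raw_csv)

# Remove columns that are not needed for the analysis.
income = income.drop(columns=["OBJECTID", "Shape__Area"])

# Rename the remaining columns for readability.
income = income.rename(
    columns={"CSA2010": "CSA", "mhhi10": "2010", "mhhi11": "2011"}
)
print(income)
```

The same three steps (read, drop, rename) would be repeated for each of the four data sets.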

Tidying Our Data

Since not all the tables have data for each year, we will only look at the years 2010-2017.

Now that we have collected our data, let's tidy it up. We'll create a new column called "Year" and another column for the variable each table measures. This is called melting. In the end, each table will have one row per CSA per year for one measured variable. This makes it easier to do an analysis later on.
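As a small illustration of melting, the sketch below reshapes a wide table (one column per year) into a long one with `DataFrame.melt`. The table contents and the "Median Income" column name are placeholders chosen for the example.

```python
import pandas as pd

# Wide table: one row per CSA, one column per year (placeholder values).
income = pd.DataFrame(
    {
        "CSA": ["Canton", "Greektown"],
        "2010": [77000, 40000],
        "2011": [80000, 42000],
    }
)

# Long table: one row per CSA per year, with the measured variable
# in its own column.
income_melt = income.melt(
    id_vars="CSA", var_name="Year", value_name="Median Income"
)

# The year labels came from column names, so they are strings; make them ints.
income_melt["Year"] = income_melt["Year"].astype(int)
print(income_melt)
```

Two CSAs and two year columns become four rows, each holding a single observation.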

Now that the data is in an organized form, we will merge all the data into a single table using a left join. A left join keeps every row of the left dataframe and adds the matching values from the right dataframe. Where the right dataframe has no match, the new columns are filled with missing values.

For example, the first merge takes each "CSA" in the income_melt table and looks for rows in the edu_melt table that have the same "CSA". If a match is found, its value is added as a new column to the left table (in this case income_melt).
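A minimal sketch of such a merge is shown below. The values are placeholders, and joining on both "CSA" and "Year" is an assumption made here so that each CSA-year row pairs with exactly one row on the right; the original merge keys may have been different.

```python
import pandas as pd

# Placeholder long-format tables; the right one is missing a CSA on purpose.
income_melt = pd.DataFrame(
    {
        "CSA": ["Canton", "Greektown", "Highlandtown"],
        "Year": [2010, 2010, 2010],
        "Median Income": [77000, 40000, 45000],
    }
)
edu_melt = pd.DataFrame(
    {
        "CSA": ["Canton", "Greektown"],
        "Year": [2010, 2010],
        "HS Completion": [90.0, 75.0],
    }
)

# Left join: every row of income_melt is kept; rows without a match in
# edu_melt get NaN in the new column.
merged = income_melt.merge(edu_melt, on=["CSA", "Year"], how="left")
print(merged)
```

The Highlandtown row survives the merge with a missing "HS Completion" value, which is the behavior that distinguishes a left join from an inner join.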

Data Visualization and Analysis

At last, we have all of our data in one table. Now that it is in a clean and readable format, let's visualize it and see if we find any trends.

Boxplots

First, let's look at some boxplots for each data set. Boxplots display the distribution of a column using the min, max, median, first quartile, and third quartile. This will give us a general idea of what the distribution looks like for each variable. We will also be able to see any outliers that we may want to remove later on.
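To make the numbers behind a boxplot concrete, the sketch below computes the quartiles for a made-up series and flags outliers with the common 1.5 × IQR rule, which is the default whisker setting in matplotlib. The values and the "Vacancy Rate" label are placeholders.

```python
import pandas as pd

# Placeholder values with one obviously extreme point.
values = pd.Series([2, 3, 4, 5, 6, 7, 8, 9, 40], name="Vacancy Rate")

q1 = values.quantile(0.25)   # first quartile
median = values.median()
q3 = values.quantile(0.75)   # third quartile
iqr = q3 - q1                # interquartile range (height of the box)

# Points beyond 1.5 * IQR from the box are drawn as individual outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3)
print(outliers.tolist())
```

Under this rule the whiskers stop at the most extreme points inside the fences, so a boxplot's "min" and "max" exclude anything drawn as an outlier.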

There is a slight skew here to higher income and more outliers appear in recent years.

This has a more normal distribution, yet there are still a good number of outliers on both ends.

There are some extreme outliers here that will certainly impact the data. We need to be careful with how we analyze them later on.

There are a decent number of outliers here, which are skewing the data, as you can see from the extended whisker. It will be interesting to see how much of an impact those data points have on our linear models.

Linear Regression

In the next section, we will try to determine if there are any relationships between different variables. We will do this through scatter plots and some linear regression. We will be using sklearn to fit most of the models.

A little bit about linear regression before we go into the plots. Linear regression (you may know it as a line of best fit) creates a model that best represents the data. A linear relationship does not necessarily mean there is causation, just that there is a correlation (the two things are related). If we have a positive correlation, the line is increasing. If it is a negative correlation, the line is decreasing. The strength of a linear relationship is measured by the correlation coefficient r, not by the slope: the relationship is strong when r is close to 1 or -1 and weak when r is close to 0. The slope depends on the units of the variables.
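One caveat is worth illustrating: the slope of a fitted line changes with the units of the variables, so the Pearson correlation coefficient r is the usual measure of how strong a linear relationship is. The sketch below uses NumPy on synthetic data (not the Baltimore data, and not the sklearn workflow mentioned above) to fit the same relationship with income in dollars and in thousands of dollars.

```python
import numpy as np

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
income_dollars = rng.uniform(20_000, 120_000, size=200)
completion = 60 + 0.0003 * income_dollars + rng.normal(0, 3, size=200)

# Line of best fit with income measured in dollars.
slope_d, intercept_d = np.polyfit(income_dollars, completion, 1)

# Same data, income measured in thousands of dollars.
income_thousands = income_dollars / 1000
slope_k, intercept_k = np.polyfit(income_thousands, completion, 1)

# Pearson correlation coefficient for each version.
r_d = np.corrcoef(income_dollars, completion)[0, 1]
r_k = np.corrcoef(income_thousands, completion)[0, 1]

print("slope per dollar:", slope_d)
print("slope per $1000: ", slope_k)
print("r (dollars):     ", r_d)
print("r (thousands):   ", r_k)
```

The two slopes differ by a factor of 1000 even though the data are identical, while r is the same in both cases.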

First, we will be exploring how income correlates with High School Completion Rate, Crime Rate, and Vacant Buildings Rate. Income is often a huge factor in the quality of life, so we will be seeing how income levels relate to the other variables.

For this plot, there is a slight positive correlation, which means that income and HS completion are somewhat related.

Here we have a slight negative correlation. This makes sense since most wealthy communities have a lot less crime than poorer communities. There are some outliers which have a significant crime rate.

For the following plot, we will try to adjust our linear regression to fit an exponential function. Since the data drops fairly sharply and then levels out, it looks as though an exponential model could fit.
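One common way to fit an exponential curve with linear-regression tools is to regress log(y) on x and then transform back. The sketch below shows that approach with NumPy on made-up, noise-free data; it is not necessarily the method used for the plot that follows, and the variable names are hypothetical.

```python
import numpy as np

# Made-up data that follows y = 30 * exp(-0.05 * x) exactly.
x = np.array([20.0, 40.0, 60.0, 80.0, 100.0])
y = 30.0 * np.exp(-0.05 * x)

# If y = a * exp(b * x), then log(y) = log(a) + b * x, which is a straight
# line in x. Fit that line with ordinary least squares.
slope, intercept = np.polyfit(x, np.log(y), 1)

a = np.exp(intercept)  # recovered scale
b = slope              # recovered decay rate

# Predictions on the original scale.
y_hat = a * np.exp(b * x)
print(a, b)
```

This transform only works when every y value is strictly positive, and fitting on the log scale weights small y values more heavily than a direct fit would.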

Well, it doesn't look like the exponential model quite worked. Using the regular way gives us a very odd plot which makes it hard to read. Here, the exponential model does not do well with larger numbers and does not provide helpful insight either.

In the last two plots, we will see how vacancy rate and crime rate relate to High School completion rate. Graduating high school opens the doors to many opportunities and is a strong indicator of the well-being of a community.

We have a slight negative correlation here. The data is clustered so heavily toward the left that it is hard to tell how strong a correlation vacant houses in a neighborhood have with HS completion rate.

Lastly, we looked at Crime vs. High School completion rate. This plot does not tell us very much: there is too much noise (scatter) in the data points to get a clear linear model.

Looking at the averages

Now, let's look at the average value across the 8 years for each CSA. This will hopefully help us gain some better insights and remove extra noise from our models.
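Averaging per CSA can be sketched with a pandas `groupby`, as below. The table name `merged`, the column names, and the values are placeholders standing in for the combined table built earlier.

```python
import pandas as pd

# Placeholder long-format table: one row per CSA per year.
merged = pd.DataFrame(
    {
        "CSA": ["Canton", "Canton", "Greektown", "Greektown"],
        "Year": [2010, 2011, 2010, 2011],
        "Median Income": [77000, 80000, 40000, 42000],
        "Crime Rate": [50.0, 46.0, 70.0, 66.0],
    }
)

# Average each measured variable over all years, one row per CSA.
averages = (
    merged.groupby("CSA")[["Median Income", "Crime Rate"]]
    .mean()
    .reset_index()
)
print(averages)
```

Each CSA now contributes a single point to a scatter plot instead of one point per year, which is why the averaged plots look less noisy.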

We can see there is a positive correlation between median income and high school completion. This is reasonable, since people with more income can probably afford to go to a good school and receive extra help to pass compared to those who don't have expendable income.

Again, here we have a negative correlation between these two variables. Higher income tends to bring more security, and less crime happens in those types of neighborhoods.

In the next plot, we will again attempt to add an exponential model.

Unfortunately, the model did not turn out as expected. This could also be because of the sensitivity of the model to lower numbers.

Here we have a strong negative correlation between these two variables. Crime is certainly a barrier to a student's ability to learn. The more crime there is, the less likely students are to feel safe and comfortable enough to learn effectively.

Lastly, there is a stronger negative correlation between building vacancy and high school completion. Higher vacancy tends to mean more poverty which comes with a variety of barriers for young people to graduate.

Mapping

Mapping can be a great way to visualize data. Here we will map the averages for each variable in a Heat Map.

There are many different ways to map data. For example, we could have done it by Zip Code, Census Tract, or Census Block. If you are interested, the Census has descriptions for each geography. We chose CSA because it is a geographic region that still keeps neighborhood-level data without being more specific than necessary.

Through Folium, we will be mapping each CSA shape on a map of Baltimore City. This will allow us to look at different layers and see which CSAs have things in common.

First, we will get the shapes for each CSA from the same site as before. We will be downloading the shapefile. Make sure you unzip the entire directory into your local folder. We will also need to install geopandas to be able to convert the shapefile into a format that can be mapped.

The map brings to light some of the regions that have the worst crime rates, most vacant buildings, and lowest high school graduation rates. East and West Baltimore (the regions just east and west of Downtown) are unfortunately known for their high crime and abandoned city blocks. We can see how the map depicts this with the darker shades of orange and purple. North Baltimore and South Baltimore contain some of the wealthier neighborhoods in the City, which we can see by the darker green. North Baltimore, especially, is where you'll find the well-known private schools in the region. Overall, interactive maps like these can help us compare different regions and draw conclusions as to why certain variables are more prominent.

Conclusion

Determining the causes of crime, vacant buildings, and HS completion rates is much more complicated than looking at some charts. The map helped us see where these problems are, but there are many socio-political factors that shape each neighborhood. While we were able to get some good insight into the relationships between variables, each neighborhood is still just as valuable as any other. It does not matter what problems are going on where, just as long as we are working to solve them.

I love Baltimore. It is one of the greatest cities in the world. You have the Ravens, the Orioles, Old Bay, and Natty Boh. No matter what neighborhood you are in, everyone there can share the same love and pride for their city. Sure, there are some problems we need to solve, but I don't know if I could live anywhere else.

Thank you for reading this analysis of the different neighborhoods of Baltimore. I hope you were able to learn something new. If you are interested in learning more about Baltimore and its neighborhoods, check out this site